Translation system, dictionary updating server, translation method, and program and recording medium for use therein

ABSTRACT

It is an object of the present invention to provide a translation system capable of preventing translation accuracy from being decreased by an increase of new phrases or the like.  
     [Constitution] 
     A translation system for translating a document comprises a dictionary management unit for managing a plurality of categorized dictionaries classified according to predetermined categories, a phrase extraction unit for extracting a noun phrase from the document, a registration category selection unit for selecting a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for registering a pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected by the registration

1. TITLE OF THE INVENTION

[0001] Translation system, dictionary updating server, translation method, and program and recording medium for use therein

2. DETAILED DESCRIPTION OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates to a translation system, a dictionary updating server, a translation method, and a program and recording medium for use in the system, server and method. More particularly, the present invention relates to a translation system, a dictionary updating server and a translation method in which a translation dictionary used for translation of documents is automatically updated, and to a program and recording medium for use in the system, server and method.

[0004] 2. Background Art

[0005] The three techniques described below have been disclosed as techniques usable for improving the accuracy of translation in a translation system for translating documents.

[0006] The first disclosed technique is a method in which character sequences for headwords of a dictionary are generated on the basis of words designated as objects to be manipulated, and are entered in the dictionary (see patent document 1).

[0007] The second disclosed technique is a method in which data on the relationship between original words and translations of the words is extracted from a text in a first language and a text in a second language translated from the text in the first language, to form a dictionary in which the words and the translations of the words are juxtaposed (see patent document 2).

[0008] The third disclosed technique is a method of forming a dictionary in which part of a translated sentence is expressed by a variable, on the basis of an example of translation and another example of translation formed by changing a word forming part of the first example of translation (see patent document 3).

[0009] [Patent Document 1]

[0010] Published Unexamined Patent Application No. 6-28391

[0011] [Patent Document 2]

[0012] Published Unexamined Patent Application No. 9-128396

[0013] [Patent Document 3]

[0014] Published Unexamined Patent Application No. 2002-297588

Problems to be Solved by the Invention

[0015] With the development of technologies and the globalization of business in recent years, new words are coined almost every day and some of them rapidly become widespread. Under such circumstances, translation systems for translating documents have the problem that, unless a user or other operator enters translations of new words or phrases, the probability that a word or phrase to be translated has been entered in a translation dictionary is reduced, resulting in a reduction in translation accuracy.

[0016] None of the techniques described in the above-mentioned patent documents 1 to 3 provides the function of entering translations corresponding to new words or phrases in a dictionary, and each is therefore incapable of solving the above-described problem.

[0017] It is, therefore, an object of the present invention to provide a translation system, a dictionary updating server and a translation method as a solution to the above-described problem, and a program and recording medium for use in the system, server and method. This object can be attained by a combination of features described in the independent claims in the appended claims. In the dependent claims, further advantageous examples of the present invention are specified.

SUMMARY OF THE INVENTION

[0018] That is, according to a first form of the present invention, there are provided a translation system for translating a document, the translation system having a dictionary management unit for managing a plurality of categorized dictionaries classified according to predetermined categories, a phrase extraction unit for extracting a noun phrase from the document, a registration destination selection unit for selecting a category in which the extracted noun phrase should be registered from among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for registering a pair of the noun phrase and the noun phrase translation in the categorized dictionary corresponding to the category selected by the registration destination selection unit; a dictionary updating server and a terminal constituting this translation system; a translation method; and a program and a recording medium for use in the system, server, terminal and method.

[0019] The above-described summary of the invention does not list all the necessary features of the present invention. Subcombinations of the features can also constitute the present invention.

Preferred Embodiment

[0020] The present invention will be described with respect to an embodiment thereof. The embodiment described below, however, does not limit the invention set forth in the appended claims, and not all combinations of features described in the description of the embodiment are necessarily indispensable to the solution according to the present invention.

[0021] FIG. 1 shows the configuration of a translation system 10 which represents an embodiment of the present invention. The translation system 10 of this embodiment extracts an unknown phrase during translation processing and generates a translation of the extracted phrase by translation. The translation system 10 is provided with the objective of limiting the reduction in translation accuracy accompanying an increase in unknown words, in such a manner that a phrase and a translation of the phrase are entered as a pair in a translation dictionary to automatically enlarge the vocabulary.

[0022] The translation system 10 includes a translation front end system 100, in which a document is translated and from which a translated document obtained as a result of translation is output, and a dictionary updating server 160, which updates the translation dictionary of the translation front end system 100 by generating a translation of a phrase extracted by the translation front end system 100.

[0023] The translation front end system 100 translates a document designated by a user, an application program or the like. The translation front end system 100 may be implemented on a terminal, such as a personal computer, PDA or portable telephone, of a user who uses the results of translation. Alternatively, the translation front end system 100 may be implemented on a server which is accessed through a communication network by a user using a browser or the like. Also, the dictionary updating server 160 may be implemented on the server on which the translation front end system 100 is implemented.

[0024] The translation front end system 100 has a translation dictionary recording unit 110, a document translation unit 120 and an extracted phrase recording unit 125.

[0025] The translation dictionary recording unit 110 stores a translation dictionary 117 used for translation by the translation front end system 100. The translation dictionary 117 includes a plurality of categorized dictionaries 115 a and 115 b respectively corresponding to a plurality of categories. In the categorized dictionaries 115 b, translations of words, phrases and the like classified into categories, e.g., sports, home, business and science, are registered. In the categorized dictionary 115 a, words, phrases and the like not classified into any of the plurality of categories corresponding to the other categorized dictionaries 115, i.e., the plurality of categorized dictionaries 115 b, and translations of them are registered. That is, words, phrases and the like not belonging to any of the plurality of categories corresponding to the plurality of categorized dictionaries 115 b are registered in the categorized dictionary 115 a. At least one of the categorized dictionaries 115 may be used with priority according to the category of a document to be translated. Further, each of the categorized dictionaries 115 a and 115 b may function as a grammar dictionary in which grammatical rules used for translation by the translation front end system 100 are stored.

[0026] The document translation unit 120 translates a document described in a first language such as English into a translated document described in a second language such as Japanese. In translation processing, the document translation unit 120 extracts an unknown phrase and outputs the extracted unknown phrase and the category of the document to the extracted phrase recording unit 125. The document translation unit 120 may select the category of a document, for example, on the basis of the contents of the document. Alternatively, the document translation unit 120 may set a document category on the basis of a designation from a user.

[0027] The extracted phrase recording unit 125 stores a phrase extracted from a document by the document translation unit 120 by relating the phrase to the phrase appearance category. The phrase appearance category is the category of the document in which the phrase has appeared. The extracted phrase recording unit 125 transmits stored pairs of phrases and appearance categories to the dictionary updating server 160, for example, periodically or according to predetermined timing.

[0028] The dictionary updating server 160 generates a translation of a phrase received from the extracted phrase recording unit 125 of the translation front end system 100 to update the translation dictionary of the translation front end system 100. The dictionary updating server 160 may be implemented together with the translation front end system 100 on a terminal for a user who will use translation results. Alternatively, the dictionary updating server 160 may be implemented together with the translation front end system 100 on a server which is accessed through a communication network by a user using a browser or the like, or may be implemented on a server which communicates, through a communication network, with a server on which the translation front end system 100 is implemented.

[0029] The dictionary updating server 160 has a phrase receiving unit 127, a phrase classification unit 130, a registration phrase recording unit 140, a translation dictionary recording unit 170, a phrase translation unit 180, an updating dictionary 185, and a dictionary registration unit 190.

[0030] The phrase receiving unit 127 receives from the extracted phrase recording unit 125 a phrase extracted from a document to be translated. The phrase classification unit 130 selects a phrase to be registered in the translation dictionary 117 from the phrases received from the extracted phrase recording unit 125 via the phrase receiving unit 127, and selects a registration category in which the phrase should be registered. When a phrase is to be registered in the categorized dictionary 115 a, the phrase and a registration category (basic category) are stored in a category-by-category registration phrase recording file 145 a in the registration phrase recording unit 140. When a phrase is to be registered in the categorized dictionary 115 b, the phrase and a registration category are stored in a category-by-category registration phrase recording file 145 b in the registration phrase recording unit 140. The registration phrase recording unit 140 supplies to the phrase translation unit 180 data on the phrases and the registration categories for the phrases stored in the category-by-category registration phrase recording files 145 a and 145 b.

[0031] The translation dictionary recording unit 170 has the same function as the translation dictionary recording unit 110 and stores a translation dictionary 177 used for translating phrases received from the translation front end system 100. Categorized dictionaries 175 a and 175 b contained in the translation dictionary 177 may be updated in synchronization with updating of the categorized dictionaries 115 a and 115 b so as to have the same contents as the categorized dictionaries 115 a and 115 b. Alternatively, the translation front end system 100 or the dictionary updating server 160 may register part of the contents of the categorized dictionaries 175 a and 175 b in the categorized dictionaries 115 a and 115 b. In a case where the translation front end system 100 and the dictionary updating server 160 are provided on one terminal or one server, for example, an arrangement may be adopted in which the translation dictionary recording unit 110 is directly connected to the phrase translation unit 180 instead of the translation dictionary recording unit 170, and the phrase translation unit 180 directly uses the translation dictionary recording unit 110.

[0032] The phrase translation unit 180 is an example of the translation unit in accordance with the present invention. The phrase translation unit 180 generates phrase translations by translating phrases received from the translation front end system 100 to form an updating dictionary used for updating of the translation dictionary 117 and the translation dictionary 177. In the updating dictionary 185, the updating dictionary formed by the phrase translation unit 180 is stored. The dictionary registration unit 190 registers pairs of phrases and phrase translations in the translation dictionary 117 and the translation dictionary 177 on the basis of the updating dictionary in the updating dictionary 185. The dictionary registration unit 190 registers a pair of a phrase and a phrase translation in the categorized dictionary 115 b and the categorized dictionary 175 b corresponding to the registration category for the phrase. If the registration category for the phrase does not fall under any of the categories corresponding to the categorized dictionaries 115 b, that is, if the registration category for the phrase is the basic category, the pair of the phrase and the phrase translation is registered in the categorized dictionary 115 a, which serves as the base dictionary, and in the categorized dictionary 175 a.

[0033] When the dictionary registration unit 190 registers a pair of a phrase and a phrase translation in the corresponding one of the categorized dictionaries 115, it sends an instruction to the translation dictionary recording unit 110 to make that unit register the pair of the phrase and the phrase translation. According to the registration instruction from the dictionary registration unit 190, the translation dictionary recording unit 110 registers the pair of the phrase and the phrase translation in the categorized dictionary 115 corresponding to the selected category.
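
The registration behavior described in paragraphs [0032] and [0033] can be illustrated by the following minimal Python sketch. It is not the implementation of the embodiment. The class name `TranslationDictionary`, the label `"base"` for the basic category, and the sample phrase pairs are assumptions introduced only for illustration.

```python
BASE_CATEGORY = "base"  # hypothetical label for the basic category


class TranslationDictionary:
    """Toy stand-in for the translation dictionary 117 (illustrative only)."""

    def __init__(self, categories):
        # One mapping per category, playing the role of dictionaries 115 b.
        self.categorized = {name: {} for name in categories}
        # Mapping playing the role of the base dictionary 115 a.
        self.base = {}

    def register(self, phrase, translation, category):
        """Store the pair and return the name of the dictionary that received it."""
        if category in self.categorized:
            self.categorized[category][phrase] = translation
            return category
        # Any category without its own dictionary falls back to the base dictionary.
        self.base[phrase] = translation
        return BASE_CATEGORY


dictionary = TranslationDictionary(["sports", "business"])
used_for_sports = dictionary.register("offside trap", "オフサイドトラップ", "sports")
used_for_basic = dictionary.register("status quo", "現状", BASE_CATEGORY)
```

In this sketch a phrase whose registration category has no categorized dictionary of its own is stored in the base mapping, mirroring the handling of the basic category.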

[0034] In the above-described translation system 10, the dictionary updating server 160 can generate a translation of a phrase extracted from a document to be translated by the translation front end system 100, and register the translation of the phrase in the categorized dictionary 115 corresponding to the phrase. Thus, the vocabulary of the translation dictionary corresponding to the category of a translated document can be increased to limit the reduction in translation accuracy accompanying an increase in unknown words.

[0035] The above-described translation front end system 100 and dictionary updating server 160 may be implemented by a combination of components different from that shown in FIG. 1. For example, the phrase classification unit 130 and registration phrase recording unit 140 may be implemented as components of the translation front end system 100 instead of being implemented as components of the dictionary updating server 160.

[0036] FIG. 2 shows an example of the hierarchical structure of the translation dictionary 117 and the translation dictionary 177 stored in the translation dictionary recording unit 110 and the translation dictionary recording unit 170 in this embodiment. A dictionary 900 corresponding to the translation dictionary 117 and the translation dictionary 177 is placed in the highest position in the hierarchical structure. The dictionary 900 is divided into categorized dictionaries 910 classified according to categories, and a base dictionary 905 in which words and phrases not classified into any of the plurality of categories corresponding to the categorized dictionaries 910 are registered. The categorized dictionaries 910 fall into a plurality of main categories 915 such as “sports” and “home”. In correspondence with each of the main categories 915, sub-categorized dictionaries 925 and a main categorized dictionary 920 are provided. The sub-categorized dictionaries 925 correspond to sub-categories, which are categories further divided from each of the main categories 915. The main categorized dictionary 920 is a dictionary in which words, phrases or the like not belonging to any of the sub-categories in the main category 915 are registered.

[0037] Each of the categorized dictionaries 115 b and the categorized dictionaries 175 b may correspond to sub-categorized dictionaries 925. In such a case, the categorized dictionary 115 a and the categorized dictionary 175 a, in which words and phrases not classified into any of the plurality of categories corresponding to the plurality of categorized dictionaries 115 b and the plurality of categorized dictionaries 175 b are registered, may correspond to the main categorized dictionaries 920, or may alternatively correspond to the base dictionary 905.

[0038] Each of the categorized dictionaries 115 b and the categorized dictionaries 175 b may include the plurality of sub-categorized dictionaries 925 and the main categorized dictionary 920 corresponding to one of the main categories 915. In such a case, the categorized dictionary 115 a and the categorized dictionary 175 a may correspond to the base dictionary 905.
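
The hierarchy of FIG. 2 can be pictured with a small data-structure sketch in Python. The class names, the `resolve` method and the sub-category name "soccer" are hypothetical and are not taken from the embodiment. The sketch only shows how an entry could be routed to a sub-categorized dictionary 925, a main categorized dictionary 920 or the base dictionary 905 according to how far it has been classified.

```python
from dataclasses import dataclass, field


@dataclass
class MainCategory:
    """One main category 915 with its dictionaries."""
    # Main categorized dictionary 920: entries outside every sub-category.
    main_dictionary: dict = field(default_factory=dict)
    # Sub-categorized dictionaries 925, keyed by sub-category name.
    sub_dictionaries: dict = field(default_factory=dict)


@dataclass
class Dictionary:
    """Top-level dictionary 900."""
    # Base dictionary 905: entries outside every main category.
    base: dict = field(default_factory=dict)
    # Main categories 915, keyed by category name.
    main_categories: dict = field(default_factory=dict)

    def resolve(self, main=None, sub=None):
        """Return the dictionary where an entry with this classification belongs."""
        category = self.main_categories.get(main)
        if category is None:
            return self.base
        return category.sub_dictionaries.get(sub, category.main_dictionary)


root = Dictionary(
    main_categories={"sports": MainCategory(sub_dictionaries={"soccer": {}})}
)
root.resolve("sports", "soccer")["offside"] = "オフサイド"
```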

[0039] FIG. 3 shows the configuration of the document translation unit 120 in this embodiment. The document translation unit 120 has a dictionary management unit 200, a morphological analysis unit 210, a phrase extraction unit 220, a syntactic analysis unit 230, a document translation generation unit 240, and a document category selection unit 250.

[0040] The dictionary management unit 200 manages the plurality of categorized dictionaries 115 recorded in the translation dictionary recording unit 110. The morphological analysis unit 210 performs morphological analysis on each of the sentences contained in a document. The phrase extraction unit 220 extracts phrases from the document on the basis of the morphological analysis. The syntactic analysis unit 230 analyzes the syntax of each sentence contained in the document on the basis of the results of morphological analysis. The document translation generation unit 240 generates a translated document by translating the document on the basis of the morphological analysis results and the syntactic analysis results by referring to the plurality of categorized dictionaries 115 through the dictionary management unit 200. The document category selection unit 250 selects the category of the document on the basis of the frequencies with which the plurality of categorized dictionaries 115 have been used by the document translation generation unit 240 in translation of the document.

[0041] FIG. 4 shows the flow of processing in the document translation unit 120 in this embodiment.

[0042] The morphological analysis unit 210 analyzes morphemes, which are the minimum meaningful units constituting each of the sentences contained in a document, and thereby recognizes words (S300). In this processing, the morphological analysis unit 210 refers to grammatical rules stored in the categorized dictionaries 115 a and 115 b and performs morphological analysis on the basis of the grammatical rules.

[0043] Subsequently, the phrase extraction unit 220 extracts unknown phrases from the document on the basis of the results of morphological analysis (S320). In this embodiment, the phrase extraction unit 220 extracts unknown noun phrases not registered in the translation dictionary 117. Alternatively, the phrase extraction unit 220 may extract various phrases including verb phrases. In S320, the phrase extraction unit 220 determines that a phrase is unknown in a case where no translation of the phrase recognized on the basis of the results of morphological analysis is registered in the plurality of categorized dictionaries 115.
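
The unknown-phrase test of step S320 can be sketched as follows. The sketch assumes that noun phrases have already been recognized by morphological analysis and are passed in as plain strings, and that each categorized dictionary is a Python mapping from phrase to translation. The function name and the sample data are hypothetical.

```python
def extract_unknown_phrases(noun_phrases, categorized_dictionaries):
    """Return the phrases that have no translation in any categorized dictionary.

    Repeated occurrences are kept so that appearance counts can be taken later.
    """
    return [
        phrase
        for phrase in noun_phrases
        if not any(phrase in d for d in categorized_dictionaries)
    ]


dictionaries = [
    {"home run": "ホームラン"},      # stands in for one dictionary 115 b
    {"balance sheet": "貸借対照表"},  # stands in for another dictionary 115 b
]
recognized = ["home run", "edge computing", "balance sheet", "edge computing"]
unknown = extract_unknown_phrases(recognized, dictionaries)
```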

[0044] Subsequently, the syntactic analysis unit 230 analyzes the syntax of each sentence contained in the document on the basis of the results of morphological analysis (S330). The document translation generation unit 240 then performs translation processing on the words and the combinations of words, such as phrases, recognized in the document on the basis of the results of morphological analysis and the results of syntactic analysis, by referring to the plurality of categorized dictionaries 115 through the dictionary management unit 200, to generate translation words for word translations, phrase translations and the like (S340).

[0045] Subsequently, the document category selection unit 250 selects the category of the document on the basis of the frequencies with which the plurality of categorized dictionaries 115 have been used by the document translation generation unit 240 in translation of the document (S350). For instance, the document category selection unit 250 divides the number of times one of the plurality of categorized dictionaries 115 has been used by the number of times some of the plurality of categorized dictionaries 115 have been used, and takes the result of this division as the frequency of occurrence, in the document, of words, phrases and the like in the corresponding category. If words, phrases and the like in one of the categories occur frequently in the document in comparison with words, phrases and the like in the other categories, the document category selection unit 250 selects this category as the category of the document. For example, in processing for this selection, if there is a category for which the above-described frequency is equal to or larger than a predetermined threshold value, the document category selection unit 250 may select this category as the category of the document.
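
The frequency-based selection of step S350 might look like the sketch below. Two points are assumptions rather than statements of the embodiment: the denominator is taken to be the total number of uses over all counted categorized dictionaries, and the threshold value 0.5 used in the example is arbitrary. The function name is hypothetical.

```python
def select_document_category(usage_counts, threshold):
    """Pick the category whose dictionary accounts for enough of the lookups.

    usage_counts maps a category name to the number of times its categorized
    dictionary was used while translating one document. Returns None when no
    category reaches the threshold.
    """
    total = sum(usage_counts.values())
    if total == 0:
        return None
    category, count = max(usage_counts.items(), key=lambda item: item[1])
    frequency = count / total  # share of lookups served by this dictionary
    return category if frequency >= threshold else None


sports_document = select_document_category({"sports": 8, "business": 2}, 0.5)
mixed_document = select_document_category({"sports": 3, "business": 3, "home": 3}, 0.5)
```

In the first call the "sports" dictionary serves 8 of 10 lookups, so "sports" is returned. In the second call no dictionary reaches one half of the lookups, so the function returns `None` and the caller would have to fall back to some other rule, such as a designation from the user.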

[0046] The document category selection unit 250 then recognizes this document category as the category in which the plurality of phrases extracted from the document appear, and registers in the extracted phrase recording unit 125 the set of the phrases extracted from the document and this phrase appearance category (S355). In this registration, the document category selection unit 250 registers in the extracted phrase recording unit 125 the number of times each phrase appears in one of a plurality of documents to be translated as the frequency of appearance of the phrase, by relating the frequency to the phrase. The document translation generation unit 240 translates the document by using with priority the categorized dictionary 115 corresponding to the category of the document (S360).

[0047] In a case where a plurality of documents to be translated exist, that is, for example, in a case where a user makes the translation system translate a plurality of documents one after another, the document translation unit 120 executes the processing of steps S300 to S360 with respect to the plurality of documents (S370). The morphological analysis unit 210 performs morphological analysis on each of the plurality of documents, the phrase extraction unit 220 extracts one or more phrases from each of the plurality of documents, and the syntactic analysis unit 230 performs syntactic analysis on each of the plurality of documents. The document translation generation unit 240 generates a translation word or combination of translation words for each of the words or combinations of words in the plurality of documents. The document category selection unit 250 selects the category of each of the plurality of documents on the basis of the frequencies of use of the plurality of categorized dictionaries 115.

[0048] In the document translation unit 120, the syntactic analysis unit 230 can recognize a phrase without analyzing the construction of words forming the phrase, since the dictionary updating server 160 registers new phrases and phrase translations one after another in the plurality of categorized dictionaries 115. Consequently, the accuracy of syntactic analysis and the speed of grammatical analysis in the document translation unit 120 can be increased.

[0049] FIG. 5 shows the configuration of the phrase classification unit 130 in this embodiment. The phrase classification unit 130 has a registration phrase selection unit 400 and a registration destination selection unit 410.

[0050] The registration phrase selection unit 400 makes a selection as to whether or not each of the phrases should be registered in the translation dictionary on the basis of the frequency with which the phrase appears in one or a plurality of documents. The registration destination selection unit 410 selects, with respect to each of the phrases extracted by the phrase extraction unit 220 and selected by the registration phrase selection unit 400 as phrases to be registered, the one of the plurality of categories respectively corresponding to the plurality of categorized dictionaries 115 in which the phrase should be registered. The registration destination selection unit 410 includes a category-by-category-basis appearance frequency computation unit 420 and a registration destination category selection unit 430.

[0051] The category-by-category-basis appearance frequency computation unit 420 computes the frequency of appearance of a phrase with respect to each of the plurality of categories on the basis of the frequency of appearance of the phrase in one or the plurality of documents to be translated and the categories of the documents. The registration destination category selection unit 430 makes a selection as to in which one of the plurality of categorized dictionaries 115 each phrase should be registered, on the basis of the frequencies of appearance of the phrase in the plurality of categories.

[0052] FIG. 6 shows the flow of processing in the phrase classification unit 130 in this embodiment.

[0053] First, the registration phrase selection unit 400 rearranges the one or more phrases received from the extracted phrase recording unit 125 according to their frequencies of appearance with respect to the categories (S500). Subsequently, if the frequency with which one of the phrases appears in one or a plurality of documents to be translated is lower than a predetermined lower limit value, the registration phrase selection unit 400 decides to inhibit the pair of the phrase and a translation of the phrase from being registered in any one of the plurality of categorized dictionaries 115 (S505). More specifically, the registration phrase selection unit 400 supplies the registration destination selection unit 410 with the information about the one or more phrases received from the extracted phrase recording unit 125 after removing from this information the information about any phrase selected as one not to be registered in any one of the categorized dictionaries 115.
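
The lower-limit check of step S505 amounts to a simple filter, sketched below. The function name, the phrase counts and the lower limit of 3 are hypothetical. Following the wording of this paragraph, a phrase is dropped only when its frequency is strictly lower than the limit.

```python
def select_registration_candidates(appearance_counts, lower_limit):
    """Keep only phrases that appear at least lower_limit times in total.

    appearance_counts maps each extracted phrase to its total number of
    appearances across the documents to be translated.
    """
    return {
        phrase: count
        for phrase, count in appearance_counts.items()
        if count >= lower_limit
    }


counts = {"edge computing": 7, "zero trust": 3, "quux widget": 1}
candidates = select_registration_candidates(counts, lower_limit=3)
```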

[0054] Subsequently, the category-by-category-basis appearance frequency computation unit 420 computes the frequency of appearance of the phrase with respect to each of the plurality of categories on the basis of the frequency of appearance of the phrase in one or the plurality of documents to be translated (S515).

[0055] Subsequently, the registration destination category selection unit 430 makes a selection as to in which one of the categorized dictionary 115 a and the plurality of categorized dictionaries 115 b each phrase should be registered, on the basis of the frequencies of appearance of the phrase in the plurality of categories. More specifically, if the phrase appears frequently in one particular category (S520), this particular category is selected as the category in which the phrase should be registered, and the phrase is stored in the category-by-category registration phrase recording file 145 b by being related to this particular category (S530). If the phrase does not appear particularly frequently in any one of the categories (S520), registration of the pair of the phrase and the translation of the phrase in the categorized dictionary 115 a provided as the base dictionary is selected, and the phrase is stored in the category-by-category registration phrase recording file 145 a by being related to the basic category (S535). The phrase classification unit 130 performs the processing shown as the above-described steps S505 to S535 for all the phrases received from the extracted phrase recording unit 125 (S540).

[0056] By the above-described processing, the phrase classification unit 130 selects, with respect to one or a plurality of documents, the category in which a phrase extracted from the one or the plurality of documents should be registered, on the basis of the frequency of appearance of the phrase.

[0057] For example, in a case where a phrase A appears with appearance frequencies d1, d2, and d3 in a document D1 in a category C1 and documents D2 and D3 in a category C2, respectively, the phrase classification unit 130 may select the category in which the phrase should be registered by the method described below by way of example. If the appearance frequency (d1+d2+d3) of the phrase A does not satisfy the condition for registration of the phrase A, the registration phrase selection unit 400 decides to inhibit registration of the phrase in any one of the plurality of categorized dictionaries 115. In the case of registering the phrase A in one of the categorized dictionaries 115, the category-by-category-basis appearance frequency computation unit 420 computes the appearance frequency d1 of the phrase A in the category C1 and the appearance frequency (d2+d3) of the phrase A in the category C2. The registration destination category selection unit 430 determines in which one of the categories the phrase A appears particularly frequently, on the basis of the appearance frequency d1 and the appearance frequency (d2+d3), to select the one of the categorized dictionaries 115 in which the phrase A should be registered.
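
The example of phrase A can be traced with the short sketch below. The specification does not quantify "particularly frequently", so the sketch assumes that a category qualifies when it accounts for at least a given share (`dominance`) of all appearances. That rule, the numeric values chosen for d1, d2, d3, the lower limit and the function names are all hypothetical.

```python
from collections import defaultdict

BASE_CATEGORY = "base"  # hypothetical label for the basic category


def frequency_by_category(appearances):
    """Sum per-document appearance counts into per-category totals.

    appearances is a list of (document_category, count) pairs.
    """
    totals = defaultdict(int)
    for category, count in appearances:
        totals[category] += count
    return dict(totals)


def choose_destination(totals, lower_limit, dominance):
    """Return None (do not register), a category name, or BASE_CATEGORY."""
    total = sum(totals.values())
    if total == 0 or total < lower_limit:
        return None  # too rare to classify reliably
    category, count = max(totals.items(), key=lambda item: item[1])
    return category if count / total >= dominance else BASE_CATEGORY


# Phrase A: d1 = 2 in D1 (category C1), d2 = 5 in D2 and d3 = 9 in D3 (category C2).
totals_a = frequency_by_category([("C1", 2), ("C2", 5), ("C2", 9)])
destination_a = choose_destination(totals_a, lower_limit=3, dominance=0.8)
```

With these illustrative numbers the per-category totals are d1 = 2 for C1 and d2+d3 = 14 for C2. C2 accounts for 14 of 16 appearances, so the phrase would be directed to the dictionary for C2.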

[0058] The above-described phrase classification unit 130 selects registration of a phrase in the translation dictionary 117 if the phrase appears with a frequency higher than the lower limit in one or a plurality of documents. In the translation system 10, therefore, a phrase which appears with such a low frequency that the phrase cannot be classified with sufficiently high accuracy with respect to the categories is not registered in the translation dictionary 117, thus preventing a reduction in translation accuracy. While the phrase classification unit 130 selects registering a phrase appearing frequently in one particular category in the categorized dictionary 115 b corresponding to the particular category, it selects registering a phrase not appearing particularly frequently in any category in the base dictionary 115 a. In the translation system 10, therefore, a phrase can be registered in a suitable one of the categorized dictionaries 115 according to the category in which the phrase appears, thereby suitably increasing the vocabulary of the translation dictionary 117 so that the accuracy of translation results is improved.

[0059] A more concrete example of processing in the phrase classification unit 130 will be described.

[0060] First, the registration phrase selection unit 400 generates the following matrix (expression (1)) expressing the frequencies (the numbers of times) with which phrase t_(i) appears in appearance category d_(j) on the basis of phrases and phrase appearance categories received from the extracted phrase recording unit 125.

[0061] [Expression 1] $$A = \begin{array}{c|cccc} & d_{1} & d_{2} & d_{3} & d_{4} \\ \hline t_{1} & 10 & 0 & 0 & 1 \\ t_{2} & 1 & 12 & 0 & 1 \\ t_{3} & 3 & 5 & 3 & 2 \end{array} \qquad (1)$$

[0062] Consider each phrase t_(i) as a vector, where each element of the vector represents the phrase frequency for each category. Then, the degree of appearance of the phrase t_(i) in the appearance category d_(j) can be expressed, for example, by the degree of similarity of the phrase t_(i) to the appearance category d_(j) as shown by the following expression (2).

[0063] [Expression 2] $$\mathrm{sim}\left(\vec{t_{i}}, \vec{e_{j}}\right) = \frac{\vec{t_{i}} \cdot \vec{e_{j}}}{\left|\vec{t_{i}}\right| \, \left|\vec{e_{j}}\right|} \qquad (2)$$

[0064] The category-by-category-basis appearance frequency computation unit 420 computes, as the appearance frequency of the phrase t_(i) with respect to the appearance category d_(j), an appearance frequency normalized by using the maximum frequency, as shown for example by the following expression (3) for tf_((i,j)).

[0065] [Expression 3] $$tf_{(i,j)} = K + \left(1 - K\right)\frac{A_{(i,j)}}{\max_{i,j}\left(A_{(i,j)}\right)} \qquad (3)$$

[0066] In expression (3), K is a constant by which the influence of the appearance frequency on the determination of registration/non-registration of the phrase is determined.

[0067] The registration destination category selection unit 430 makes a selection as to whether or not the phrase t_(i) should be registered in the appearance category d_(j) on the basis of the degree of appearance of the phrase t_(i) in the appearance category d_(j) and/or the frequency of appearance of the phrase t_(i) in the appearance category d_(j). At the time of selection as to whether or not the phrase t_(i) should be registered in the appearance category d_(j) on the basis of both the degree of appearance and the frequency of appearance of the phrase t_(i) in the appearance category d_(j), the registration destination category selection unit 430 may determine whether or not the phrase t_(i) should be registered in the appearance category d_(j) on the basis of the product of the degree of similarity shown by expression (2) and the appearance frequency shown by expression (3).
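As a worked illustration of expressions (1) to (3), the following sketch computes the product of the similarity and the normalized frequency for every phrase and category. It rests on two assumptions not stated in the text: e_j is taken to be the unit vector of category d_j, and K is set to 0.5 purely for illustration.

```python
import math

# Frequency matrix A from expression (1): rows are phrases t1..t3,
# columns are appearance categories d1..d4.
A = [
    [10, 0, 0, 1],
    [1, 12, 0, 1],
    [3, 5, 3, 2],
]
K = 0.5  # hypothetical value; the text only says K is a constant

def similarity(row, j):
    """Expression (2), assuming e_j is the unit vector of category j."""
    norm = math.sqrt(sum(x * x for x in row))
    return row[j] / norm if norm else 0.0

def normalized_tf(A, i, j, K):
    """Expression (3): frequency normalized by the largest entry of A."""
    a_max = max(max(r) for r in A)
    return K + (1 - K) * A[i][j] / a_max

def score(A, i, j, K):
    """Product of expressions (2) and (3) for phrase i and category j."""
    return similarity(A[i], j) * normalized_tf(A, i, j, K)

# Index of the best-scoring category for each phrase.
best = [
    max(range(len(A[0])), key=lambda j: score(A, i, j, K))
    for i in range(len(A))
]
```

With these numbers t1 scores highest in d1 and t2 in d2, while t3, whose appearances are spread over all four categories, gets a clearly lower best score. A registration threshold on the score (not specified in the text) would then decide whether such a phrase goes to a category or to the base dictionary.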

[0068] The phrase classification unit 130 performs the above-described processing with respect to a plurality of sub-categories to register, in the translation dictionary 117 and the translation dictionary 177, phrases appearing particularly frequently in one of the sub-categories, the phrases being registered in decreasing order of appearance frequency. After removing the phrases registered in one of the sub-categories by this processing, the phrase classification unit 130 again performs the above-described processing with respect to the plurality of main categories to register, in the translation dictionary 117 and the translation dictionary 177, phrases not appearing particularly frequently in any one of the sub-categories but appearing frequently in one of the main categories, the phrases being registered in decreasing order of appearance frequency.
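The two-pass order described above (sub-categories first, then main categories for the remaining phrases) might be sketched as below. The function name is hypothetical, and a plain frequency threshold stands in for the "particularly frequently" test, which the text leaves to the processing of expressions (2) and (3).

```python
def two_pass_register(sub_freqs, main_freqs, threshold):
    """Assign phrases to sub-categories first, then to main categories.

    sub_freqs / main_freqs: {phrase: {category: frequency}}
    threshold: hypothetical stand-in for the registration criterion.
    Returns {phrase: category} in registration order.
    """
    registered = {}
    for freqs in (sub_freqs, main_freqs):  # sub-categories are handled first
        candidates = []
        for phrase, by_category in freqs.items():
            if phrase in registered:
                continue  # already registered in a sub-category
            category, freq = max(by_category.items(), key=lambda kv: kv[1])
            if freq >= threshold:
                candidates.append((freq, phrase, category))
        # Register in decreasing order of appearance frequency.
        for freq, phrase, category in sorted(candidates, key=lambda c: -c[0]):
            registered[phrase] = category
    return registered

result = two_pass_register(
    {"p": {"s1": 9, "s2": 1}, "q": {"s1": 2, "s2": 2}},
    {"p": {"m1": 10}, "q": {"m1": 4}},
    threshold=4,
)
```

Phrase "p" is concentrated in sub-category "s1" and is registered there in the first pass. Phrase "q" does not qualify in any sub-category, so it is picked up by main category "m1" in the second pass.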

[0069] This embodiment may alternatively be such that in the above-described steps S520, S530 and S535 the registration destination category selection unit 430 selects a particular one of the categories as a category in which the phrase should be registered if the frequency of appearance of the phrase in the particular category is equal to or larger than a predetermined value, and selects registering the pair of the phrase and a translation of the phrase in the base dictionary, i.e., the categorized dictionary 115 a, if the frequency of appearance of the phrase in the particular category is lower than the predetermined value.

[0070] FIG. 7 shows the configuration of the phrase translation unit 180 in this embodiment. The phrase translation unit 180 includes a priority setting unit 605, a translation word generation unit 600, a page search unit 610, a morphological analysis unit 613, a syntactic analysis unit 616, and a phrase translation generation unit 620.

[0071] The priority setting unit 605 selects, for each of the phrases stored in the category-by-category registration phrase recording files 145 a and 145 b, one of the categorized dictionaries 175 to be used with priority for translation of the phrase. The translation word generation unit 600 translates each of the phrases stored in the category-by-category registration phrase recording files 145 a and 145 b to generate a phrase translation candidate which is a candidate for a phrase translation. The page search unit 610 searches pages on a network to find pages containing phrase translation candidates corresponding to the phrases. The morphological analysis unit 613 has the same configuration and function as those of the morphological analysis unit 210, and performs morphological analysis on each phrase to be analyzed. The syntactic analysis unit 616 has the same configuration and function as those of the syntactic analysis unit 230, and performs syntactic analysis on each phrase to be analyzed. The phrase translation generation unit 620 generates a phrase translation of each phrase on the basis of the results of morphological analysis and syntactic analysis or the result of page search performed by the page search unit 610.

[0072] FIG. 8 shows the flow of processing in the phrase translation unit 180 in this embodiment. The priority setting unit 605 first obtains in sequence the phrases stored in the category-by-category registration phrase recording files 145 a and 145 b in the registration phrase recording unit 140, which phrases are to be registered in the translation dictionary 117. If the obtained phrase is a phrase stored in one of the category-by-category registration phrase recording files 145 b (S700), the priority setting unit 605 increases, in comparison with the priorities for the other categorized dictionaries, the priority for the categorized dictionary 175 b corresponding to the registration category in which the phrase is to be registered, this category having been selected by the registration destination category selection unit 430 and stored in the category-by-category registration phrase recording file 145 in association with the phrase (S710). The priority setting unit 605 thereby determines prioritized use of the categorized dictionary 175 b (S710). If the obtained phrase is a phrase stored in the category-by-category registration phrase recording file 145 a (S700), the priority setting unit 605 determines equally-prioritized use of all the categorized dictionaries 175 b.

[0073] Subsequently, the morphological analysis unit 613, the syntactic analysis unit 616, and the phrase translation generation unit 620 translate the translation-target phrase to generate a phrase translation as a translation of the phrase (S720). That is, the morphological analysis unit 613 performs morphological analysis on the analysis-object phrase by referring to the categorized dictionaries 175 a and 175 b. The syntactic analysis unit 616 then performs syntactic analysis on the analysis-object phrase on the basis of the results of morphological analysis. The phrase translation generation unit 620 generates a phrase translation by translating the translation-target phrase by referring to the categorized dictionaries 175 a and 175 b with respect to each of the words, etc., in the phrase recognized on the basis of the results of morphological analysis and syntactic analysis. If prioritized use of the categorized dictionary 175 b is determined in step S710, the phrase translation generation unit 620 translates the phrase by using the categorized dictionary 175 b with priority to generate a phrase translation.
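One way to picture the prioritized use of a categorized dictionary in steps S710 and S720 is a lookup that consults the dictionaries in descending order of priority. The data layout, the function names and the placeholder translations ("T_base", "T_cat") below are all hypothetical; the embodiment does not specify how priorities are represented.

```python
def set_priorities(categories, registration_category=None, boost=1):
    """S710 sketch: raise the priority of the phrase's registration category.

    registration_category=None models a phrase from file 145 a, for which
    all dictionaries are used with equal priority.
    """
    priorities = {c: 0 for c in categories}
    if registration_category is not None:
        priorities[registration_category] += boost
    return priorities

def lookup(word, dictionaries, priorities):
    """S720 sketch: return the translation from the highest-priority hit."""
    # sorted() is stable, so equal priorities keep the original order.
    ordered = sorted(dictionaries, key=lambda c: -priorities.get(c, 0))
    for category in ordered:
        if word in dictionaries[category]:
            return dictionaries[category][word]
    return None

dictionaries = {
    "base": {"enterprise": "T_base"},
    "computers": {"enterprise": "T_cat"},
}
prio = set_priorities(dictionaries, registration_category="computers")
translation = lookup("enterprise", dictionaries, prio)
```

With the "computers" category boosted, the category-specific translation wins over the base dictionary entry; without a boost the base entry is found first.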

[0074] Subsequently, the phrase translation generation unit 620 generates, on the basis of the translation-target phrase and the phrase translation of the phrase, an updating dictionary used for updating of the translation dictionary 117 (S730). In the updating dictionary generated by the phrase translation generation unit 620, identification information for identifying the registration category in which the phrase and the phrase translation of the phrase are to be registered, or the categorized dictionary 115 in which the phrase is to be registered, is held in association with the phrase.

[0075] The phrase translation unit 180 performs processing from S700 to S730 with respect to the phrases which are stored in the category-by-category registration phrase recording files 145 a and 145 b, and which are to be registered in the translation dictionary 117 (S740).

[0076] In the above-described phrase translation unit 180, when the translation word generation unit 600 and the phrase translation generation unit 620 generate a phrase translation of a phrase to be registered, they use the categorized dictionary 175 corresponding to the category in which the phrase should be registered, and can generate the phrase translation on the basis of prioritized use of translations of words and phrases used in the category in which the phrase is to be registered, thus improving the phrase translation accuracy.

[0077] FIG. 9 shows the flow of network-mediated phrase translation generation processing in the phrase translation unit 180 in this embodiment. The phrase translation unit 180 performs processing shown in FIG. 9 in the step S720 shown in FIG. 8 in the case of generating a phrase translation by using pages on a network such as the Internet.

[0078] The translation word generation unit 600 first translates a translation-target phrase and generates one or more phrase translation candidates as candidates for a phrase translation (S800). The page search unit 610 then searches pages on the network to find pages containing the phrase translation candidates (S810). The phrase translation generation unit 620 makes a selection as to whether or not one of the phrase translation candidates should be selected as a phrase translation on the basis of whether or not any page containing the phrase translation candidate has been hit (S820).

[0079] For instance, in a case where the translation-target phrase is “enterprise software”, the translation word generation unit 600 generates “”, “” and “” as phrase translation candidates. Subsequently, the page search unit 610 performs a search to find pages containing the phrase translation candidate “”, pages containing the phrase translation candidate “” and pages containing the phrase translation candidate “”. If some pages containing “” are hit while no page containing “” or “” is hit, the phrase translation generation unit 620 selects “” as a phrase translation.

[0080] If pages containing some of the plurality of phrase translation candidates are hit, the phrase translation generation unit 620 may select the phrase translation candidate corresponding to the largest number of hit pages. Alternatively, the phrase translation generation unit 620 may select the phrase translation candidate most frequently hit on pages on the network.
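Steps S800 to S820 with the "largest number of hit pages" rule could be sketched as follows. The `count_hits` callable is a hypothetical stand-in for the page search unit 610 (no real search API is implied), and the candidate strings are placeholders rather than actual translations.

```python
def select_translation(candidates, count_hits):
    """Pick the candidate with the most hit pages, or None if nothing is hit.

    candidates: phrase translation candidates generated in S800.
    count_hits: hypothetical callable returning the number of pages on the
                network that contain a given candidate (S810).
    """
    hits = {c: count_hits(c) for c in candidates}
    best = max(candidates, key=lambda c: hits[c])
    # S820: a candidate is adopted only if at least one page contains it.
    return best if hits[best] > 0 else None

# Stubbed search results standing in for a real network search.
fake_index = {"candidate_1": 0, "candidate_2": 42, "candidate_3": 7}
chosen = select_translation(list(fake_index), fake_index.get)
```

When only one candidate has any hits, this reduces to the simpler rule of paragraph [0079]; when no candidate is found on any page, no translation is selected.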

[0081] The phrase translation unit 180 may perform the above-described processing by a method described below.

[0082] First, the translation word generation unit 600 translates each of the words contained in a translation-target phrase and generates one or more translation words corresponding to the word in the phrase by referring to the categorized dictionaries 175 a and 175 b (S800). The page search unit 610 then searches pages on the network to find pages containing at least one word in each of the groups of translation words corresponding to the translation-target words, and makes this search with respect to all the words contained in the translation-target phrase (S810). The phrase translation generation unit 620 generates a phrase translation on the basis of words and phrases on the searched pages containing at least one word in each of the groups of translation words corresponding to all the words contained in the phrase (S820).
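The word-by-word variant can be illustrated with an in-memory stand-in for the network. In this sketch the page texts, the placeholder translation words ("E1", "S1", etc.) and the `joiner` parameter are assumptions for illustration; how the unit 620 actually composes a phrase from the words found on a page is not detailed in the text.

```python
from itertools import product

def matching_pages(pages, word_groups):
    """S810 sketch: keep pages containing at least one word of every group."""
    return {
        name: text
        for name, text in pages.items()
        if all(any(word in text for word in group) for group in word_groups)
    }

def phrase_from_pages(pages, word_groups, joiner=" "):
    """S820 sketch: return the first combination of one word per group that
    appears as a contiguous portion of a matching page, or None."""
    for text in matching_pages(pages, word_groups).values():
        for combo in product(*word_groups):
            candidate = joiner.join(combo)
            if candidate in text:
                return candidate
    return None

groups = [["E1", "E2", "E3"], ["S1"]]  # placeholder translation words
pages = {
    "A": "E1 appears here and S1 much later",
    "B": "E2 tools, S1 vendors",
    "C": "this page uses E3 S1 as a term",
    "D": "E1 only",
}
found = phrase_from_pages(pages, groups)
```

Pages A, B and C each contain one word from both groups, but only page C contains the words as one contiguous portion, so that portion is chosen. Page D is dropped in the search step because it has no word from the second group.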

[0083] For instance, in a case where the translation-target phrase is “enterprise software”, the translation word generation unit 600 translates “enterprise” and “software” contained in the phrase to generate translations for “enterprise”: { , , } and a translation word { } for “software”. Subsequently, the page search unit 610 searches pages on the network to find pages each containing at least one word in each of the groups of translation words corresponding to the words contained in the translation-target phrase, i.e., a page A containing “” and “”, a page B containing “” and “” and a page C containing “” and “”. The phrase translation generation unit 620 generates a phrase translation on the basis of words and phrases on the pages A to C containing at least one word in each of the groups of translation words for all the words contained in the phrase. In this instance, if “” and “” are not described in any portion of the pages A and B, and if the page C has a portion where “” is described, the portion “” on the page C containing at least one word in each of the groups of translation words for all the translation-target words is selected as a phrase translation of “enterprise software”.

[0084] In the above-described processing, the phrase translation generation unit 620 may generate a phrase translation on the basis of the numbers of hit pages in search results. That is, in the above-described instance, the phrase translation generation unit 620 may generate a phrase translation by selecting the words corresponding to the pages having the largest number of hits among the number of hits of pages containing “” and “”, the number of hits of pages containing “” and “” and the number of hits of pages containing “” and “”.

[0085] FIGS. 10(a) and 10(b) show an example of the results of translation by the document translation unit 120 and the phrase translation unit 180 in this embodiment in a case where a registration-object phrase is a noun phrase “Visitor reviews”.

[0086] FIG. 10(a) shows the result of translation in a case where the document translation unit 120 performs sentence-prioritized translation when translating a portion of a document other than a noun phrase.

[0087] The morphological analysis unit 210 first performs morphological analysis on a translation-target noun phrase and analyzes words in the phrase as parts of speech or the like. The syntactic analysis unit 230 then performs syntactic analysis on the basis of grammatical rules registered in the categorized dictionaries 175 a and 175 b.

[0088] In syntactic analysis, the syntactic analysis unit 230 assigns to each English word a cost indicating how infrequently the word is used as the given part of speech. For example, the cost at which the English word “Visitor” is used as a noun is 5, as shown in parentheses in the figure.

[0089] Subsequently, the syntactic analysis unit 230 generates a phrase by using a combination described in the grammatical rules registered in the categorized dictionaries 175 a and 175 b and assigns a cost to the phrase. In this example, the cost of use as noun+noun is 80, the cost of use of a single noun as a noun phrase is 18, and the cost of use of a single verb as a verb is 15.

[0090] The syntactic analysis unit 230 generates a complete sentence by combining the phrases and assigns a cost to the complete sentence. In this example, the cost of construction of noun phrase+verb phrase is 18, and each of the cost of a complete sentence 990 a formed by a single noun phrase and the cost of a complete sentence 990 b formed by noun phrase+verb phrase is 200.

[0091] The syntactic analysis unit 230 computes the sum of the costs with respect to the complete sentences 990 a and 990 b analyzed as described above. For example, the sum of the costs of the complete sentence 990 a is “noun (5)+noun (5)+noun phrase (80)+complete sentence (200)=290”. On the other hand, the sum of the costs of the complete sentence 990 b is 261.
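The two cost sums can be checked with a few lines of arithmetic. The breakdown of the 990 b total is a reconstruction: the text does not state the cost of “reviews” used as a verb, so the value 5 below is an assumption chosen because it makes the sum equal the stated 261.

```python
# Costs quoted in the text for FIG. 10(a).
NOUN = 5                  # a word used as a noun
NOUN_NOUN_PHRASE = 80     # noun + noun combined into a noun phrase
SINGLE_NOUN_PHRASE = 18   # single noun used as a noun phrase
SINGLE_VERB_PHRASE = 15   # single verb rule
NP_VP = 18                # noun phrase + verb phrase construction
COMPLETE_SENTENCE = 200   # complete sentence (same for 990 a and 990 b)
VERB = 5                  # ASSUMED: not stated in the text

# 990 a: "Visitor reviews" analyzed as a single noun phrase.
parse_990a = [NOUN, NOUN, NOUN_NOUN_PHRASE, COMPLETE_SENTENCE]
# 990 b: "Visitor" as noun phrase, "reviews" as verb phrase.
parse_990b = [NOUN, SINGLE_NOUN_PHRASE, VERB, SINGLE_VERB_PHRASE,
              NP_VP, COMPLETE_SENTENCE]

totals = {"990a": sum(parse_990a), "990b": sum(parse_990b)}
cheapest = min(totals, key=totals.get)
```

Under these costs the sentence analysis 990 b (261) is cheaper than the noun phrase analysis 990 a (290), so sentence-prioritized translation picks 990 b.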

[0092] As a result of the above-described syntactic analysis, the syntactic analysis unit 230 outputs a grammar having the smallest value as the sum of costs, i.e., a grammar by which “Visitor reviews” is translated into the complete sentence 990 b, as a syntactic analysis result. According to this grammar, the document translation generation unit 240 outputs a translation result “”.

[0093] FIG. 10(b) shows the result of translation in a case where the phrase translation unit 180 performs noun phrase-prioritized translation. In the case of generation of a noun phrase translation, the phrase translation unit 180 assigns a higher priority to use of a grammatical rule for a translation result as a noun phrase in comparison with translation of a portion of a document other than a noun phrase by the document translation unit 120. That is, as shown in FIG. 10(b), the cost of the complete sentence formed only of the noun phrase shown in FIG. 10(a) is determined by subtracting a predetermined value, e.g., 150 from the cost of the complete sentence 990 b. The syntactic analysis unit 616 outputs a grammar by which “Visitor reviews” is translated into the complete sentence 990 a as a result of syntactic analysis of “Visitor reviews”. According to this grammar, the phrase translation generation unit 620 outputs a translation result “”.

[0094] As described above, the phrase translation unit 180 prioritizes a grammatical rule for a noun phrase-prioritized translation in the case of generating a noun phrase translation in comparison with translation of a portion other than the noun phrase. More specifically, the phrase translation unit 180 assigns a higher priority to a grammatical rule for a noun phrase-prioritized translation in the case of translating a noun phrase to be registered, in comparison with a grammatical rule for translation into a sentence formed of a combination of a noun and a verb.
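The effect of the noun phrase priority on the “Visitor reviews” example can be sketched as a discount on the total cost of the noun-phrase-only analysis. The function and parameter names are hypothetical. The totals 290 and 261 and the value 150 come from the text, on the reading that the complete-sentence cost of 200 for the noun-phrase-only sentence is reduced by 150.

```python
def pick_parse(totals, noun_phrase_parses, discount=0):
    """Return (cheapest parse, adjusted totals).

    totals:             {parse id: summed cost}
    noun_phrase_parses: parse ids whose result is a noun phrase
    discount:           predetermined value subtracted for those parses
    """
    adjusted = {
        parse: cost - discount if parse in noun_phrase_parses else cost
        for parse, cost in totals.items()
    }
    return min(adjusted, key=adjusted.get), adjusted

totals = {"990a": 290, "990b": 261}

# Document translation unit 120: no noun phrase priority.
sentence_choice, _ = pick_parse(totals, {"990a"})
# Phrase translation unit 180: noun phrase rule prioritized by 150.
phrase_choice, adjusted = pick_parse(totals, {"990a"}, discount=150)
```

Without the discount the sentence analysis 990 b wins; with it, the noun phrase analysis 990 a drops to 140 and becomes the cheapest, which matches the outcome described for FIG. 10(b).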

[0095] The phrase translation unit 180 may register in at least one of the categorized dictionaries 175 a and 175 b a noun-phrase grammatical rule which is provided as a method for noun phrase-prioritized translation and used by the phrase translation unit 180 in translation of a noun phrase.

[0096] The above-described phrase translation unit 180 sets a higher priority for use of a grammatical rule for a translation result as a noun phrase in the case of generating a noun phrase translation of a noun phrase extracted from a translation-target document, in comparison with translation of a portion other than the noun phrase in the document. In this manner, the phrase translation unit 180 can perform translation suitable for extracted noun phrases, such that the accuracy of translation is improved.

[0097] FIG. 11 shows an example of a hardware configuration of a computer 1000 in this embodiment. The translation front end system 100 and/or the dictionary updating server 160 of this embodiment are implemented by using the computer 1000. The computer 1000 has a CPU 1100, CPU peripheral components, i.e., a RAM 1120, a graphic controller 1175 and a display device 1180, which are connected to each other by a host controller 1182. The computer 1000 also has a communication interface 1130, a hard disk drive 1140, and an input/output unit having a CD-ROM drive 1160. These components are connected to the host controller 1182 by an input/output controller 1184. The computer 1000 further has a ROM 1110 and a legacy input/output unit having a flexible disk drive 1150 and an input/output chip 1170. These components are connected to the input/output controller 1184.

[0098] The host controller 1182 connects the RAM 1120 to the CPU 1100 and the graphic controller 1175, which access the RAM 1120 at a high transfer rate. The CPU 1100 operates on the basis of programs stored in the ROM 1110 and the RAM 1120 and controls each component. The graphic controller 1175 obtains image data formed on a frame buffer provided in the RAM 1120 by the CPU 1100 or the like, and displays the image data on the display device 1180. Alternatively, the graphic controller 1175 may incorporate a frame buffer for storing image data formed by the CPU 1100 or the like.

[0099] The input/output controller 1184 connects the communication interface 1130, which is an input/output device of a comparatively high speed, the hard disk drive 1140 and the CD-ROM drive 1160 to the host controller 1182. The communication interface 1130 performs communication with other units via a network. The hard disk drive 1140 stores programs and data used by the computer 1000. The CD-ROM drive 1160 reads out a program or data from a CD-ROM 1195 and provides the read program or data to the RAM 1120 and/or the hard disk drive 1140.

[0100] To the input/output controller 1184, the ROM 1110 and input/output devices of a comparatively low speed such as the flexible disk drive 1150 and the input/output chip 1170 are connected. The ROM 1110 stores a boot program which is executed at the time of startup of the computer 1000, a program dependent on the hardware of the computer 1000, etc. The flexible disk drive 1150 reads a program or data from a flexible disk 1190 and provides the read program or data to the CPU 1100 and/or the hard disk drive 1140 via the input/output controller 1184. To the input/output chip 1170, the flexible disk 1190 and various input/output devices are connected, for example, through a parallel port, a serial port, a keyboard port, a mouse port, and the like.

[0101] A program to be provided to the CPU 1100 via the RAM 1120 is provided by a user in a state of being stored on a recording medium such as the flexible disk 1190, the CD-ROM 1195 or an IC card. The program is read out from the recording medium, is installed in the computer 1000 via the input/output controller 1184 and the RAM 1120, and is executed by the CPU 1100.

[0102] A program installed in and executed by the computer 1000 to enable the computer 1000 to operate as the translation front end system 100 has document translation modules including a dictionary management module, a morphological analysis module, a phrase extraction module, a syntactic analysis module, a document translation generation module, and a document category selection module. This program or these modules enable the computer 1000 to function as a document translation unit 120 including the dictionary management unit 200, the morphological analysis unit 210, the phrase extraction unit 220, the syntactic analysis unit 230, the document translation generation unit 240, and the document category selection unit 250. The translation dictionary recording unit 110 and the extracted phrase recording unit 125 may be implemented as the hard disk drive 1140 or a recording medium on a server connected to a network.

[0103] A program installed in and executed by the computer 1000 to enable the computer 1000 to operate as the dictionary updating server 160 has a registration phrase selection module, a registration destination selection module including a category-by-category-basis appearance frequency computation module and a registration destination category selection module, a phrase translation module including a translation word generation module, a page search module, a morphological analysis module, a syntactic analysis module and a phrase translation generation module, and a dictionary registration module. This program or these modules enable the computer 1000 to operate as the registration phrase selection unit 400, the registration destination selection unit 410 including the category-by-category-basis appearance frequency computation unit 420 and the registration destination category selection unit 430, the phrase translation unit 180 including the translation word generation unit 600, the page search unit 610, the morphological analysis unit 613, the syntactic analysis unit 616 and the phrase translation generation unit 620, and the dictionary registration unit 190. The registration phrase recording unit 140, the translation dictionary recording unit 170 and the updating dictionary 185 may be implemented as the hard disk drive 1140 or a recording medium on a server connected to a network.

[0104] The above-described programs or modules may be stored on an external storage medium. As this storage medium, an optical recording medium such as a DVD or a PD, a magneto-optical recording medium such as an MD, a tape medium or a semiconductor memory such as an IC card may be used as well as the flexible disk 1190 and the CD-ROM. Also, a storage device such as a hard disk or a RAM provided in a server system connected to a special-purpose communication network or the Internet may be used as a recording medium to provide the programs to the computer 1000 via the network.

[0105] While the present invention has been described with respect to the embodiment thereof, the technical scope of the present invention is not limited to the scope described in the above description of the embodiment. Various modifications and changes may be made in the above-described embodiment. From the description in the appended claims, it is apparent that a form of the present invention including such modifications and changes is also included in the technical scope of the present invention.

[0106] According to the above-described embodiment, a translation system, a dictionary updating server, a translation method and a program and a recording medium in the system, server and method shown in items below can be implemented.

[0107] (Item 1)

[0108] A translation system for translating a document, having a dictionary management unit for managing a plurality of categorized dictionaries classified according to predetermined categories, a phrase extraction unit for extracting a noun phrase from the document, a registration destination selection unit for selecting a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for registering a pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected by the registration destination selection unit.

[0109] (Item 2)

[0110] The translation system according to Item 1, further having a document category selection unit for selecting the category of the document on the basis of the frequencies of use of the plurality of categorized dictionaries in translation of the document, wherein the registration destination selection unit selects a category on which the extracted noun phrase should be registered on the basis of the category selected by the document category selection unit.

[0111] (Item 3)

[0112] The translation system according to Item 2, wherein the document category selection unit selects the category of each of a plurality of documents on the basis of the frequencies of use of the plurality of categorized dictionaries in translation of the plurality of documents; the phrase extraction unit extracts the noun phrase from the plurality of documents; and the registration destination selection unit selects a category on which the noun phrase should be registered, on the basis of the frequencies of appearance of the noun phrase in the plurality of documents and the categories of the documents.

[0113] (Item 4)

[0114] The translation system according to Item 3, further having a registration phrase selection unit for inhibiting the pair of the noun phrase and the noun phrase translation from being registered in any one of the plurality of categorized dictionaries if the frequency with which the noun phrase appears in the plurality of documents is lower than a predetermined lower limit value.

[0115] (Item 5)

[0116] The translation system according to Item 2, wherein one of the plurality of categorized dictionaries is a base dictionary in which words and phrases not classified into any one of the plurality of categories corresponding to the plurality of categorized dictionaries are registered, and the registration destination selection unit has a category-by-category-basis appearance frequency computation unit for computing the frequency of appearance of the noun phrase with respect to each of the plurality of categories on the basis of the frequencies of appearance of the noun phrase in the plurality of documents and the categories of the documents, and a registration destination category selection unit for making a selection as to in which one of the plurality of categorized dictionaries the pair of the noun phrase and the noun phrase translation should be registered, on the basis of the frequencies of appearance of the noun phrase with respect to the plurality of categories, wherein the dictionary registration unit registers the pair of the noun phrase and the noun phrase translation in the base dictionary when the registration destination category selection unit selects registration of the noun phrase in the base dictionary.

[0117] (Item 6)

[0118] The translation system according to Item 2, wherein the registration destination selection unit selects a category on which the noun phrase should be registered on the basis of the degrees of appearance of the noun phrase with respect to the plurality of categories corresponding to the plurality of documents.

[0119] (Item 7)

[0120] The translation system according to Item 1, wherein the translation unit translates the noun phrase to generate the noun phrase translation on the basis of prioritized use of the categorized dictionary corresponding to the category which is selected by the registration destination selection unit and on which the noun phrase should be registered.

[0121] (Item 8)

[0122] The translation system according to Item 1, wherein the translation unit sets a higher priority for use of a grammatical rule for a translation result as a noun phrase in the case of generating the noun phrase translation, in comparison with translation of a portion other than the noun phrase in the document.

[0123] (Item 9)

[0124] The translation system according to Item 1, wherein the translation unit has a translation word generation unit for generating a noun phrase translation candidate as a candidate for the noun phrase translation, a page search unit for searching pages on a network to find pages containing the noun phrase translation candidate, and a noun phrase translation generation unit which makes a selection as to whether or not the noun phrase translation candidate should be selected as the noun phrase translation on the basis of whether or not any page containing the noun phrase translation candidate has been hit.

[0125] (Item 10)

[0126] A dictionary updating server for updating dictionaries for use in translating a document at a terminal managing a plurality of categorized dictionaries classified according to predetermined categories, the server having a noun phrase receiving unit for receiving a noun phrase extracted from the document from the terminal, a registration destination selection unit for selecting a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase received from the terminal to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for registering a pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected by the registration destination selection unit.

[0127] (Item 11)

[0128] A translation system for translating a document, having a terminal for updating dictionaries for use in the translation on the basis of an instruction from an external dictionary updating server, the terminal having a translation dictionary recording unit for storing a plurality of categorized dictionaries classified according to predetermined categories, a phrase extraction unit for extracting a noun phrase from the document, and a document translation unit for translating the document by using the plurality of categorized dictionaries, the dictionary updating server having a registration destination selection unit for selecting a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for issuing an instruction to register a pair of the noun phrase and the noun phrase translation to the categorized dictionary corresponding to the category selected by the registration destination selection unit, wherein the translation dictionary recording unit registers the pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected by the registration destination selection unit on the basis of the registration instruction issued by the dictionary registration unit.

[0129] (Item 12)

[0130] A program product for a translation system for translating a document, the program product containing a program which enables the translation system to function as a dictionary management unit for managing a plurality of categorized dictionaries classified according to predetermined categories, a phrase extraction unit for extracting a noun phrase from the document, a registration destination selection unit for selecting a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for registering a pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected by the registration destination selection unit.

[0131] (Item 13)

[0132] A program product for a dictionary updating server for updating dictionaries for use in translating a document at a terminal managing a plurality of categorized dictionaries classified according to predetermined categories, the program product containing a program which enables the dictionary updating server to function as a noun phrase receiving unit for receiving a noun phrase extracted from the document from the terminal, a registration destination selection unit for selecting a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, a translation unit for translating the noun phrase received from the terminal to generate a noun phrase translation which is a translation of the noun phrase, and a dictionary registration unit for registering a pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected by the registration destination selection unit.

[0133] (Item 14)

[0134] A translation method in a translation system for translating a document by using a computer, comprising the steps of dictionary management with the computer to manage a plurality of categorized dictionaries classified according to predetermined categories, phrase extraction with the computer to extract a noun phrase from the document, registration destination selection with the computer to select a category on which the extracted noun phrase should be registered among a plurality of categories corresponding to the plurality of categorized dictionaries, respectively, translation with the computer to translate the noun phrase to generate a noun phrase translation which is a translation of the noun phrase, and dictionary registration with the computer to register a pair of the noun phrase and the noun phrase translation on the categorized dictionary corresponding to the category selected in the registration destination selection step.
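The sequence of steps in Item 14 can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the function `update_dictionaries` and its three callables (`extract_noun_phrases`, `select_category`, `translate`) are hypothetical placeholders for the phrase extraction, registration destination selection and translation steps, and the categorized dictionaries are represented as plain mappings.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical representation: category name -> {noun phrase -> translation}.
Dictionaries = Dict[str, Dict[str, str]]


def update_dictionaries(
    document: str,
    dictionaries: Dictionaries,
    extract_noun_phrases: Callable[[str], Iterable[str]],
    select_category: Callable[[str, str], str],
    translate: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Extract noun phrases, pick a category, translate, and register the pair.

    Returns the (category, noun phrase, translation) triples that were registered.
    """
    registered = []
    for phrase in extract_noun_phrases(document):
        category = select_category(phrase, document)
        if category not in dictionaries:
            # The category must be one of the managed categorized dictionaries.
            raise KeyError(f"unknown category: {category}")
        translation = translate(phrase, category)
        dictionaries[category][phrase] = translation
        registered.append((category, phrase, translation))
    return registered
```

Passing the steps in as callables keeps the sketch independent of any particular morphological analyzer or translation engine.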

Advantages of the Invention

[0135] According to the present invention, as is apparent from the above description, translations of phrases extracted from a document to be translated are generated and registered in a translation dictionary to prevent a reduction in translation accuracy due to the addition of newly created words, phrases and the like.

3. BRIEF DESCRIPTION OF THE DRAWINGS

[0136] FIG. 1 shows the configuration of a translation system 10 in an embodiment of the present invention;

[0137] FIG. 2 shows an example of the hierarchical structure of a translation dictionary 117 and a translation dictionary 177 stored in a translation dictionary recording unit 110 and a translation dictionary recording unit 170 in the embodiment of the present invention;

[0138] FIG. 3 shows the configuration of a document translation unit 120 in the embodiment of the present invention;

[0139] FIG. 4 shows the flow of processing in the document translation unit 120 in the embodiment of the present invention;

[0140] FIG. 5 shows the configuration of a phrase classification unit 130 in the embodiment of the present invention;

[0141] FIG. 6 shows the flow of processing in the phrase classification unit 130 in the embodiment of the present invention;

[0142] FIG. 7 shows the configuration of a phrase translation unit 180 in the embodiment of the present invention;

[0143] FIG. 8 shows the flow of processing in the phrase translation unit 180 in the embodiment of the present invention;

[0144] FIG. 9 shows the flow of network-mediated phrase translation generation processing in the phrase translation unit 180 in the embodiment of the present invention;

[0145] FIGS. 10(a) and 10(b) show an example of the results of translation in the document translation unit 120 and the phrase translation unit 180 in the embodiment of the present invention. FIG. 10(a) shows a translation result in the case of sentence-prioritized translation. FIG. 10(b) shows a translation result in the case of noun phrase-prioritized translation; and

[0146] FIG. 11 shows an example of a hardware configuration of a computer 1000 in the embodiment of the present invention.

DESCRIPTION OF SYMBOLS

[0147] 10 . . . Translation system

[0148] 100 . . . Translation front end system

[0149] 110 . . . Translation dictionary recording unit

[0150] 115 a, 115 b . . . Categorized dictionary

[0151] 117 . . . Translation dictionary

[0152] 120 . . . Document translation unit

[0153] 125 . . . Extracted phrase recording unit

[0154] 127 . . . Phrase receiving unit

[0155] 130 . . . Phrase classification unit

[0156] 140 . . . Registration phrase recording unit

[0157] 145 a, 145 b . . . Category-by-category-basis registration phrase recording file

[0158] 160 . . . Dictionary updating server

[0159] 170 . . . Translation dictionary recording unit

[0160] 175 a, 175 b . . . Categorized dictionary

[0161] 177 . . . Translation dictionary

[0162] 180 . . . Phrase translation unit

[0163] 185 . . . Updating dictionary

[0164] 190 . . . Dictionary registration unit

[0165] 200 . . . Dictionary management unit

[0166] 210 . . . Morphological analysis unit

[0167] 220 . . . Phrase extraction unit

[0168] 230 . . . Syntactic analysis unit

[0169] 240 . . . Document translation generation unit

[0170] 250 . . . Document category selection unit

[0171] 400 . . . Registration phrase selection unit

[0172] 410 . . . Registration destination selection unit

[0173] 420 . . . Category-By-Category-Basis appearance frequency

1) A translation system for translating a document, comprising: a dictionary management unit for managing a plurality of categorized dictionaries classified according to predetermined categories; a phrase extraction unit for extracting a noun phrase from said document; a registration category selection unit for selecting a category on which said extracted noun phrase should be registered among a plurality of categories corresponding to said plurality of categorized dictionaries, respectively; a translation unit for translating said noun phrase to generate a noun phrase translation which is a translation of said noun phrase; and a dictionary registration unit for registering a pair of said noun phrase and said noun phrase translation on said categorized dictionary corresponding to the category selected by said registration category selection unit.

2) The translation system according to claim 1, further comprising a document category selection unit for selecting the category of said document on the basis of the frequencies of use of said plurality of categorized dictionaries in translation of said document, wherein said registration destination selection unit selects a category on which said extracted noun phrase should be registered on the basis of the category selected by said document category selection unit.

3) The translation system according to claim 2, wherein said document category selection unit selects the category of each of said plurality of documents on the basis of the frequencies of use of said plurality of categorized dictionaries in translation of said plurality of documents, and wherein said phrase extraction unit extracts said noun phrase from said plurality of documents, and said registration destination selection unit selects a category on which said noun phrase should be registered, on the basis of the frequencies of appearance of said noun phrase in said plurality of documents and the categories of the documents.

4) The translation system according to claim 3, further comprising a registration phrase selection unit for inhibiting the pair of said noun phrase and said noun phrase translation from being registered in any one of said plurality of categorized dictionaries if the frequency with which said noun phrase appears in said plurality of documents is lower than a predetermined lower limit value.

5) The translation system according to claim 2, wherein one of said plurality of categorized dictionaries is a base dictionary in which words and phrases not classified into any one of the plurality of categories corresponding to another plurality of categorized dictionaries are registered, and said registration destination selection unit has: a category-by-category-basis appearance frequency computation unit for computing the frequency of appearance of said noun phrase with respect to each of said plurality of categories on the basis of the frequencies of appearance of said noun phrase in said plurality of documents and the categories of the documents; and a registration destination category selection unit for making a selection as to in which one of said plurality of categorized dictionaries the pair of said noun phrase and said noun phrase translation should be registered, on the basis of said frequencies of appearance of the noun phrase with respect to said plurality of categories, wherein said dictionary registration unit registers the pair of said noun phrase and said noun phrase translation in said base dictionary when said registration destination category selection unit selects registration of said noun phrase in said base dictionary.

6) The translation system according to claim 2, wherein said registration destination selection unit selects a category on which said noun phrase should be registered on the basis of the degrees of appearance of said noun phrase with respect to the plurality of categories corresponding to said plurality of documents.

7) The translation system according to claim 1, wherein said translation unit translates said noun phrase to generate said noun phrase translation on the basis of prioritized use of said categorized dictionary corresponding to the category which is selected by said registration destination selection unit and on which said noun phrase should be registered.

8) The translation system according to claim 1, wherein said translation unit sets a higher priority for use of a grammatical rule for a translation result as a noun phrase in the case of generating said noun phrase translation, in comparison with translation of a portion other than the noun phrase in said document.

9) The translation system according to claim 1, wherein said translation unit has: a translation word generation unit for generating a noun phrase translation candidate as a candidate for said noun phrase translation; a page search unit for searching pages on a network to find pages containing said noun phrase translation candidate; and a noun phrase translation generation unit which makes a selection as to whether or not said noun phrase translation candidate should be selected as said noun phrase translation on the basis of whether or not any page containing said noun phrase translation candidate has been hit.

10) A dictionary updating server for updating dictionaries for use in translating a document at a terminal managing a plurality of categorized dictionaries classified according to predetermined categories, comprising: a noun phrase receiving unit for receiving a noun phrase extracted from said document from said terminal; a registration category selection unit for selecting a category on which said extracted noun phrase should be registered among a plurality of categories corresponding to said plurality of categorized dictionaries, respectively; a translation unit for translating said noun phrase received from said terminal to generate a noun phrase translation which is a translation of said noun phrase; and a dictionary registration unit for registering a pair of said noun phrase and said noun phrase translation on said categorized dictionary corresponding to the category selected by said registration category selection unit.

11) A translation system for translating a document, having a terminal for updating dictionaries for use in the translation on the basis of an instruction from an external dictionary updating server, wherein said terminal comprises: a translation dictionary recording unit for storing a plurality of categorized dictionaries classified according to predetermined categories; a phrase extraction unit for extracting a noun phrase from said document; and a document translation unit for translating said document by using said plurality of categorized dictionaries, wherein said dictionary updating server comprises: a registration category selection unit for selecting a category on which said extracted noun phrase should be registered among a plurality of categories corresponding to said plurality of categorized dictionaries, respectively; a translation unit for translating said noun phrase to generate a noun phrase translation which is a translation of said noun phrase; and a dictionary registration unit for issuing an instruction to register a pair of said noun phrase and said noun phrase translation to said categorized dictionary corresponding to the category selected by said registration category selection unit, wherein said translation dictionary recording unit registers the pair of said noun phrase and said noun phrase translation on said categorized dictionary corresponding to the category selected by said registration category selection unit on the basis of the registration instruction issued by said dictionary registration unit.

12) A program product for a translation system for translating a document, the program product containing a program which enables said translation system to function as: a dictionary management unit for managing a plurality of categorized dictionaries classified according to predetermined categories; a phrase extraction unit for extracting a noun phrase from said document; a registration category selection unit for selecting a category on which said extracted noun phrase should be registered among a plurality of categories corresponding to said plurality of categorized dictionaries, respectively; a translation unit for translating said noun phrase to generate a noun phrase translation which is a translation of said noun phrase; and a dictionary registration unit for registering a pair of said noun phrase and said noun phrase translation on said categorized dictionary corresponding to the category selected by said registration category selection unit.

13) A program product for a dictionary updating server for updating dictionaries for use in translating a document at a terminal managing a plurality of categorized dictionaries classified according to predetermined categories, the program product containing a program which enables said dictionary updating server to function as: a noun phrase receiving unit for receiving a noun phrase extracted from said document from said terminal; a registration category selection unit for selecting a category on which said extracted noun phrase should be registered among a plurality of categories corresponding to said plurality of categorized dictionaries, respectively; a translation unit for translating said noun phrase received from said terminal to generate a noun phrase translation which is a translation of said noun phrase; and a dictionary registration unit for registering a pair of said noun phrase and said noun phrase translation on said categorized dictionary corresponding to the category selected by said registration category selection unit.

14) A translation method in a translation system for translating a document by using a computer, comprising the steps of: a dictionary management with the computer to manage a plurality of categorized dictionaries classified according to predetermined categories; a phrase extraction with the computer to extract a noun phrase from said document; a registration category selection with the computer to select a category on which said extracted noun phrase should be registered among a plurality of categories corresponding to said plurality of categorized dictionaries, respectively; a translation with the computer to translate said noun phrase to generate a noun phrase translation which is a translation of said noun phrase; and a dictionary registration with the computer to register a pair of said noun phrase and said noun phrase translation on said categorized dictionary corresponding to the category selected in said registration category selection step.
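The frequency-based selection recited in claims 3 to 5 can be sketched in Python as follows. The claims state only that the selection is made on the basis of the category-by-category appearance frequencies and a lower limit value; the concrete rule used here (register in the most frequent category when its share reaches a `dominance` ratio, otherwise in the base dictionary), the default parameter values, and the name `choose_registration_category` are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def choose_registration_category(
    appearances: Iterable[Tuple[str, int]],
    lower_limit: int = 2,
    dominance: float = 0.6,
    base: str = "base",
) -> Optional[str]:
    """Pick the dictionary in which a noun phrase should be registered.

    `appearances` holds one (document category, count in that document)
    pair per document. Returns None when registration is inhibited.
    The `dominance` rule and the default values are assumptions of this sketch.
    """
    per_category = Counter()
    for category, count in appearances:
        per_category[category] += count
    total = sum(per_category.values())
    if total == 0 or total < lower_limit:
        # Too rare across the documents: do not register at all.
        return None
    top_category, top_count = per_category.most_common(1)[0]
    if top_count / total >= dominance:
        return top_category
    # Spread across categories: register in the base dictionary instead.
    return base
```

A phrase that appears mostly in documents of one category is registered in that category's dictionary, a phrase spread evenly over several categories goes to the base dictionary, and a phrase that appears too rarely is not registered.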